Abstract

In this paper, we propose an Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT+) model. Given a patient, we estimate whether a new laboratory test should belong to a "taken" or a "not-taken" class. We use the Bayesian method to build a weighting function for a laboratory test and the given patient. A higher weight means that the laboratory test has a higher probability of being "taken" by the patient and a lower probability of being "not-taken". For the sake of effectiveness and robustness, we further integrate several modified smoothing techniques into the model. To evaluate the BPLT+ model objectively, we propose a framework in which the data set is randomly split into a training set, a validation input set and a validation label set. A training matrix is generated from the training set. Then, instead of accessing the training set repeatedly, we use this training matrix to predict the laboratory tests for the validation input set. Finally, the recommended ranking list is compared with the validation label set using our proposed CorrectRate metric. We conduct experiments on real medical data, and the experimental results show the effectiveness of the proposed BPLT+ model.



Background 

Large amounts of clinical laboratory test data are collected and stored every day, so there is an increasing need to analyze and use these data. The problem we address in this paper is recommending laboratory tests for given patients. Health care recommendation problems have drawn researchers' attention for years. However, few studies have been conducted on the clinical laboratory test recommendation problem.

The medical data we are working on contain several years of patients' laboratory test records. Figure 1 shows an example of the data format. Formally, the laboratory test prediction problem can be described as follows [1]: "Given a set of patients $P = \{p_1, p_2, \ldots, p_N\}$ and a set of laboratory tests $T = \{test_1, test_2, \ldots, test_M\}$, each patient $p_j$ has done tests $test_{j,1}, \ldots, test_{j,k_j}$. If a doctor would like to assign a new test for patient $p_j$, which test in $T$ should be chosen?"

Computer systems have played an important role in health care for years [2-8]. Statistical algorithms [9-12] play an important role in investigating health care data. The work in [13,14] extracts chemical keywords from a query patent by analyzing word frequency and the word's effect over the data collection. Bayesian learning is a widely used approach that shows good performance [15-19]. A semantic-based association rule mining approach is proposed to model medical query contexts in [20]. Using a novel classifier based on the Bayesian discriminant function, Raymer et al. [21] present a hybrid algorithm that employs feature selection and extraction to isolate salient features from large medical and other biological data sets. Martin and Perez [22] analyze the robustness of the optimal action in a Bayesian decision making problem in the context of health care. The work in [23,24] studies the association between two words by simulating the impact of words in documents in the context of information retrieval. A probabilistic survival model is derived from survival analysis theory for measuring the aspect novelty of genomics data [25]. A mixture Markov model is proposed to investigate user navigation patterns so that a personalized recommendation system can be built for each user [26]. In our previous work [1], we proposed a laboratory test prediction model that objectively determines whether a laboratory test is associated with a patient. This paper is a significant extension of [1].

Figure 1 An example dataset. The format of the laboratory data sets is presented: the attributes from left to right are SDTE (SERVICE DATE), REQ# (REQUISITION NUMBER), PNUM (PATIENT HEALTH CARD#), PNAM (PATIENT NAME), PSEX (PATIENT SEX), BDTE (PATIENT DATE OF BIRTH), TSEQ (TEST SEQUENCE NUMBER), TEST (TEST CODE), DESC (TEST DESCRIPTION), RSLT (TEST RESULT), NORM (NORMAL RANGE), REXP (RESULT EXPECTED Y/N), EXRS (EXTENDED RESULT Y/N). The patient information in this table is fictitious to protect privacy.

Smoothing [27] is a technique for creating an approximating function that attempts to capture important patterns in the data while leaving out noise or other fine-scale structures and rapid phenomena. Smoothing techniques have been used in many fields to improve accuracy [28]. Based on the basic Bayesian algorithm and smoothing techniques, we propose an Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT+) model to investigate the correlation among laboratory tests for each patient. Evaluation is a crucial issue in the health care domain [29]. Some previous health care researchers evaluate via patient interaction [30] or statistics [31]. We present a metric, CorrectRate_X, that adapts the idea of Mean Average Precision (MAP) [32] from the Information Retrieval domain.

Four contributions are presented in this paper. First, we learn the associations among laboratory tests and make personalized recommendations to patients without human interaction. Second, we integrate modified smoothing techniques to improve the personalized recommendation model and propose the BPLT+ model. Third, we propose a framework that randomly generates a training data set, a validation input set and a validation label set. Fourth, we use an objective evaluation metric for personalized recommendation systems that requires no patient interaction.
Methods

Bayesian-based personalized laboratory tests recommendation (BPLT) model

Here we assume that the laboratory tests for a patient are associated with each other. For instance, if a patient is suspected to have diabetes, the doctor will usually assign both a Hemoglobin test and a Glucose Fasting test to this patient. There is thus an association between Hemoglobin and Glucose Fasting with respect to some hidden information, diabetes in this case. Conversely, if a patient is assigned a Hemoglobin test, it is very likely that this patient should also take a Glucose Fasting test. In this section, we build a model for learning the associations among laboratory tests, inferring the associations between patients and laboratory tests, and thereby recommending new laboratory tests to the patients. We regard the test recommendation problem as a special classification problem, where a test belongs to either a "taken" or a "not-taken" class. We use a Bayesian classifier as our basic classifier and modify it into a personalized ranking model.

Basic concept: Bayesian classifier

A classification problem is the following [33]: given a set of training instances, each described by a set of n attributes and each belonging to exactly one of a certain number of possible classes, learn to classify new, unseen objects. In addition, each attribute has a fixed number of possible values. We use the naive Bayesian classifier as our basic classifier in this paper, since it directly evaluates the probability of taking a test and the conditional probability between two tests. Moreover, naive Bayes is easy to construct and performs surprisingly well in classification, even though the conditional independence assumption is rarely true in real-world applications [34]. The probability model for a classifier is a conditional model.



Therefore, the probability of a class $C$ given features $F_1, \ldots, F_n$ is

$$\Pr(C \mid F_1, \ldots, F_n) \qquad (1)$$

where $F_1, \ldots, F_n$ are attributes and $C$ is a class variable. By Bayes' rule, it equals

$$\frac{\Pr(C)\,\Pr(F_1, \ldots, F_n \mid C)}{\Pr(F_1, \ldots, F_n)} \qquad (2)$$

The denominator is effectively constant, and the numerator is equivalent to the joint probability model

$$\Pr(C, F_1, \ldots, F_n) = \Pr(C)\,\Pr(F_1 \mid C)\,\Pr(F_2 \mid C, F_1)\,\Pr(F_3 \mid C, F_1, F_2) \cdots \Pr(F_n \mid C, F_1, \ldots, F_{n-1})$$

Naive Bayes assumes that the features are conditionally independent:

$$\Pr(F_i \mid C, F_j) = \Pr(F_i \mid C), \quad \text{for } i \neq j$$

so that

$$\Pr(C \mid F_1, \ldots, F_n) = A\,\Pr(C) \prod_{i=1}^{n} \Pr(F_i \mid C) \qquad (3)$$

where $A = \frac{1}{\Pr(F_1, \ldots, F_n)}$ is a constant.
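
As an illustration of Formula (3), the following minimal sketch scores two classes with binary features and then normalizes the scores, which plays the role of the constant A. The function names, the priors and the likelihood values are hypothetical and chosen only for the example; they are not taken from the data set used in this paper.

```python
import math


def naive_bayes_log_scores(priors, likelihoods, features):
    """Return log Pr(C) + sum_i log Pr(F_i | C) for every class C.

    priors:      {class: Pr(C)}
    likelihoods: {class: [Pr(F_i = 1 | C) for each binary feature i]}
    features:    observed 0/1 values of the features
    """
    scores = {}
    for cls, prior in priors.items():
        log_score = math.log(prior)
        for p_one, value in zip(likelihoods[cls], features):
            # Pr(F_i = value | C) for a binary feature
            log_score += math.log(p_one if value == 1 else 1.0 - p_one)
        scores[cls] = log_score
    return scores


def normalize(scores):
    """Turn unnormalized log scores into posteriors (applies the constant A)."""
    top = max(scores.values())
    exp_scores = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exp_scores.values())
    return {cls: v / total for cls, v in exp_scores.items()}


# Hypothetical numbers for illustration only.
priors = {"taken": 0.3, "not-taken": 0.7}
likelihoods = {"taken": [0.8, 0.6], "not-taken": [0.2, 0.4]}
posterior = normalize(naive_bayes_log_scores(priors, likelihoods, [1, 1]))
```

Working in log space avoids numerical underflow when many features are multiplied, which is also why the weighting function below is expressed as a sum of logarithms.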



The weighting function of BPLT model 

In this section, we describe the Bayesian-based Personalized Laboratory Tests recommendation (BPLT) model, which was proposed in our previous work [1]; more details are given in this paper. The purpose of the BPLT model is to classify laboratory tests for individual patients according to their personal conditions. In the real world, it is often relatively easy to obtain information about patients' previous laboratory tests. Therefore, the BPLT model recommends additional new laboratory tests to patients, given the laboratory tests that the patients have previously taken.

Suppose we have a set of $M$ laboratory tests $T = \{test_1, test_2, \ldots, test_M\}$, and a patient $p_j$ who has taken tests $T_j = \{test_{j,1}, \ldots, test_{j,k_j}\}$, where $test_{j,i} \in T$ for all $1 \le i \le k_j$. We denote the events that tests in $T_j$ are taken by $p_j$ as $F_{j,1}, F_{j,2}, \ldots, F_{j,M}$. For example, if we have 7 tests in $T$, and $p_j$ has taken $test_3$, $test_5$ and $test_7$, the events could be represented as $(F_{j,1}, F_{j,2}, \ldots, F_{j,7}) = (0, 0, 1, 0, 1, 0, 1)$. A Bayesian classifier is employed to evaluate the association between $p_j$ and a new test $test_0$, where $test_0 \in T$ and $test_0 \notin T_j$. We use $F_{j,0}$ to represent the event that $p_j$ should take $test_0$, and $\bar{F}_{j,0}$ to represent the event that $p_j$ should not take $test_0$. By Formula (3), the probability of $F_{j,0}$ given $F_{j,1}, F_{j,2}, \ldots, F_{j,M}$ is

$$\Pr(F_{j,0} \mid F_{j,1}, F_{j,2}, \ldots, F_{j,M}) \propto \Pr(F_{j,0}) \prod_{i=1}^{M} \Pr(F_{j,i} \mid F_{j,0})$$

The probability of $\bar{F}_{j,0}$ given $F_{j,1}, F_{j,2}, \ldots, F_{j,M}$ is

$$\Pr(\bar{F}_{j,0} \mid F_{j,1}, F_{j,2}, \ldots, F_{j,M}) \propto \Pr(\bar{F}_{j,0}) \prod_{i=1}^{M} \Pr(F_{j,i} \mid \bar{F}_{j,0})$$

In the BPLT model, we reward the tests with a high probability of "taken" and a low probability of "not-taken". The correlation between a new test $test_0$ and a given patient $p_j$ is given in Definition 1 [1].

Definition 1 The correlation between a new test $test_0$ and a given patient $p_j$ is defined as the logarithm of the probability that $p_j$ should take $test_0$ divided by the probability that $p_j$ should not take $test_0$, given $F_{j,1}, F_{j,2}, \ldots, F_{j,M}$:

$$corr(test_0, p_j) = \log \frac{\Pr(F_{j,0} \mid F_{j,1}, F_{j,2}, \ldots, F_{j,M})}{\Pr(\bar{F}_{j,0} \mid F_{j,1}, F_{j,2}, \ldots, F_{j,M})} \qquad (4)$$
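We can see that a higher value of $corr(test_0, p_j)$ indicates that $test_0$ has a stronger association with $p_j$. The calculation of $corr(test_0, p_j)$ can be further simplified as follows:

$$\begin{aligned}
corr(test_0, p_j) &= \log \Pr(F_{j,0} \mid F_{j,1}, \ldots, F_{j,M}) - \log \Pr(\bar{F}_{j,0} \mid F_{j,1}, \ldots, F_{j,M}) \\
&= \log \Big[ \Pr(F_{j,0}) \prod_{i=1}^{M} \Pr(F_{j,i} \mid F_{j,0}) \Big] - \log \Big[ \Pr(\bar{F}_{j,0}) \prod_{i=1}^{M} \Pr(F_{j,i} \mid \bar{F}_{j,0}) \Big] \\
&= \log \frac{\Pr(F_{j,0})}{\Pr(\bar{F}_{j,0})} + \sum_{i=1}^{M} \log \frac{\Pr(F_{j,i} \mid F_{j,0})}{\Pr(F_{j,i} \mid \bar{F}_{j,0})}
\end{aligned} \qquad (5)$$

Moreover, a test belongs to either the "taken" class or the "not-taken" class. Thus, the following two formulas hold:

$$\Pr(F_{j,0}) + \Pr(\bar{F}_{j,0}) = 1$$

$$\Pr(F_{j,i} \mid F_{j,0}) \Pr(F_{j,0}) + \Pr(F_{j,i} \mid \bar{F}_{j,0}) \Pr(\bar{F}_{j,0}) = \Pr(F_{j,i})$$

from which we can obtain $\Pr(\bar{F}_{j,0})$ and $\Pr(F_{j,i} \mid \bar{F}_{j,0})$:

$$\Pr(\bar{F}_{j,0}) = 1 - \Pr(F_{j,0})$$

$$\Pr(F_{j,i} \mid \bar{F}_{j,0}) = \frac{\Pr(F_{j,i}) - \Pr(F_{j,i} \mid F_{j,0}) \Pr(F_{j,0})}{1 - \Pr(F_{j,0})}$$

Thus $\Pr(\bar{F}_{j,0})$ and $\Pr(F_{j,i} \mid \bar{F}_{j,0})$ in (5) can be eliminated from $corr(test_0, p_j)$, as shown below:

$$corr(test_0, p_j) = \log \frac{\Pr(F_{j,0})}{1 - \Pr(F_{j,0})} + \sum_{i=1}^{M} \log \frac{\Pr(F_{j,i} \mid F_{j,0}) \, (1 - \Pr(F_{j,0}))}{\Pr(F_{j,i}) - \Pr(F_{j,i} \mid F_{j,0}) \cdot \Pr(F_{j,0})}$$

The joint probability that patient $p_j$ takes both of the tests $test_i$ and $test_0$ is

$$\Pr(F_{j,i}, F_{j,0}) = \Pr(F_{j,i} \mid F_{j,0}) \Pr(F_{j,0})$$

The correlation between $test_0$ and $p_j$ is therefore

$$corr(test_0, p_j) = \log \frac{\Pr(F_{j,0})}{1 - \Pr(F_{j,0})} + \sum_{i=1}^{M} \log \frac{\Pr(F_{j,i}, F_{j,0}) \, (1 - \Pr(F_{j,0}))}{\Pr(F_{j,0}) \, (\Pr(F_{j,i}) - \Pr(F_{j,i}, F_{j,0}))}$$

Writing the ratio inside the sum in the simplified form

$$\frac{\Pr(F_{j,i} \mid F_{j,0})}{\Pr(F_{j,i}) - \Pr(F_{j,i} \mid F_{j,0})}$$

leads to the following Definition 2 [1].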

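Definition 2 The weighting function for a laboratory test $test_0$ for a patient $p_j$ is the simplified correlation between $test_0$ and $p_j$:

$$w(test_0, p_j) = (k_j - 1) \cdot \log \frac{1 - \alpha}{\alpha} + \sum_{i=1}^{k_j} \log \frac{\beta_{j,i}}{\gamma_{j,i} - \beta_{j,i}} \qquad (6)$$

where

$$\alpha = \Pr(F_{j,0}) = \frac{\text{number of patients who have taken } test_0}{\text{number of patients}}$$

$$\gamma_{j,i} = \Pr(F_{j,i}) = \frac{\text{number of patients for whom } F_{j,i} \text{ holds}}{\text{number of patients}}$$

$$\beta_{j,i} = \Pr(F_{j,i} \mid F_{j,0}) = \frac{\Pr(F_{j,i}, F_{j,0})}{\Pr(F_{j,0})} = \frac{1}{\alpha} \cdot \frac{\text{number of patients for whom both } F_{j,0} \text{ and } F_{j,i} \text{ hold}}{\text{number of patients}}$$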

The new laboratory tests are ranked in a list according to $w(test_0, p_j)$ for a given patient $p_j$. In a later section, we present the evaluation environment for the laboratory test ranking list.
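
The weighting function in formula (6) can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: the function names and the counts in the example are hypothetical, and the code assumes 0 < α < 1 and 0 < β < γ for every test so that each logarithm is defined (the zero cases are what the smoothing discussed next is meant to handle).

```python
import math


def estimate_probabilities(n_patients, n_test0, n_taken, n_both):
    """Estimate alpha, beta_i and gamma_i from raw counts as in Definition 2.

    n_patients: total number of patients
    n_test0:    number of patients who have taken the candidate test_0
    n_taken:    per existing test i, number of patients who have taken it
    n_both:     per existing test i, number of patients who took it and test_0
    """
    alpha = n_test0 / n_patients
    gammas = [n / n_patients for n in n_taken]
    # (1 / alpha) * n_both / n_patients simplifies to n_both / n_test0
    betas = [n / n_test0 for n in n_both]
    return alpha, betas, gammas


def bplt_weight(alpha, betas, gammas):
    """Formula (6): (k - 1) * log((1 - alpha) / alpha) + sum_i log(beta_i / (gamma_i - beta_i))."""
    k = len(betas)
    weight = (k - 1) * math.log((1.0 - alpha) / alpha)
    for beta, gamma in zip(betas, gammas):
        weight += math.log(beta / (gamma - beta))
    return weight


# Hypothetical counts: 100 patients, 20 took test_0, the patient has k = 2 tests.
alpha, betas, gammas = estimate_probabilities(100, 20, [30, 50], [2, 4])
weight = bplt_weight(alpha, betas, gammas)
```

To produce a recommendation list, the same function would be evaluated for every candidate test not yet taken by the patient, and the candidates sorted by decreasing weight.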

An advanced model: BPLT+

To obtain a more robust and better-performing model, we further propose an advanced model, BPLT+, which improves the BPLT model with several smoothing techniques. There are two reasons for smoothing BPLT. One reason is that smoothing is a way to deal with noise in the data. The other is to avoid mathematically undefined expressions. When the laboratory test $test_0$ has not been observed in previous visits, which means $\alpha = 0$, the first part of formula (6) is undefined. Likewise, when the joint frequency of two laboratory tests is zero, which means $\beta_{j,i} = 0$, the second part of (6) is undefined. Therefore, we introduce smoothing techniques to further improve the BPLT model.

Smoothing techniques

In statistics, smoothing [27] is a technique for creating an approximating function that attempts to capture important patterns in the data while leaving out noise or other fine-scale structures and rapid phenomena. The main purpose of smoothing in this paper is to assign a non-zero probability to unseen tests and to improve the accuracy of test probability estimation in general.

The smoothing techniques are discussed based on the following definition of a conditional probability [28]:

$$\Pr(t \mid p) = \frac{c(t;p)}{\sum_{t' \in T} c(t';p)} \qquad (7)$$

where $c(t;p)$ is the count of a patient $p$ taking a test $t$. Here are some commonly used smoothing methods. Since we have defined a ranking problem, which is similar to the problems in Information Retrieval (IR), we use some smoothing methods that are widely used in language models in IR. The general form of a smoothed model [35] is assumed to be the following:

$$\Pr(t \mid p) = \begin{cases} \Pr_s(t \mid p) & \text{if test } t \text{ is observed} \\ \Pr(t \mid C) & \text{otherwise} \end{cases} \qquad (8)$$

where $\Pr_s(t \mid p)$ is the smoothed probability of a test $t$ given the patient with existing tests, and $\Pr(t \mid C)$ is the probability of a test $t$ given the whole data set.

A smoothing method may be as simple as adding an extra count to every test, which is called additive or Laplace smoothing, or more sophisticated, as in Katz smoothing, where tests with different counts are treated differently. Three representative methods that are popular and effective are:

♦ The Jelinek-Mercer method

$$\Pr_{\lambda}(t \mid p) = (1 - \lambda) \Pr(t \mid p) + \lambda \Pr(t \mid C) \qquad (9)$$

where $\lambda$ is a balancing parameter that ranges from 0 to 1.

♦ Bayesian smoothing using Dirichlet priors

$$\Pr_{\mu_0}(t \mid p) = \frac{c(t;p) + \mu_0 \Pr(t \mid C)}{\sum_{t' \in T} c(t';p) + \mu_0} \qquad (10)$$

where $\mu_0$ is a balancing parameter and $\mu_0 > 0$. The Laplace method is a special case of this technique.

♦ Absolute discounting

$$\Pr_{\delta}(t \mid p) = \frac{\max(c(t;p) - \delta, 0)}{\sum_{t' \in T} c(t';p)} + \sigma \Pr(t \mid C) \qquad (11)$$

where $\delta \in [0, 1]$ is a discount constant and $\sigma = \delta |p|_u / |p|$, so that all probabilities sum to one. Here $|p|_u$ is the number of unique terms in the document (the distinct tests of patient $p$ in our setting), and $|p|$ is the total count of words in the document (the total test count of $p$).
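
The three methods in formulas (9) to (11) can be compared side by side with a small sketch. The function names, the counts and the collection probabilities below are hypothetical and serve only to show that each method gives an unseen test a non-zero probability while the probabilities still sum to one.

```python
def jelinek_mercer(count, total, p_collection, lam):
    """Formula (9): linear interpolation of Pr(t|p) with Pr(t|C)."""
    return (1.0 - lam) * (count / total) + lam * p_collection


def dirichlet_prior(count, total, p_collection, mu0):
    """Formula (10): add mu0 pseudo-counts distributed according to Pr(t|C)."""
    return (count + mu0 * p_collection) / (total + mu0)


def absolute_discounting(count, total, n_unique, p_collection, delta):
    """Formula (11): subtract delta from each seen count, redistribute via Pr(t|C)."""
    sigma = delta * n_unique / total
    return max(count - delta, 0.0) / total + sigma * p_collection


# Hypothetical patient: test "a" taken 3 times, "b" once, "c" never.
counts = {"a": 3, "b": 1, "c": 0}
collection = {"a": 0.5, "b": 0.3, "c": 0.2}   # Pr(t|C), assumed values
total = sum(counts.values())
n_unique = sum(1 for c in counts.values() if c > 0)

jm = {t: jelinek_mercer(c, total, collection[t], 0.2) for t, c in counts.items()}
dp = {t: dirichlet_prior(c, total, collection[t], 2.0) for t, c in counts.items()}
ad = {t: absolute_discounting(c, total, n_unique, collection[t], 0.5)
      for t, c in counts.items()}
```

In this toy case the unseen test "c" receives a probability of 0.04 under Jelinek-Mercer, about 0.067 under Dirichlet priors, and 0.05 under absolute discounting, instead of zero.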

BPLT+ with smoothing techniques

There are two parts in formula (6) that need smoothing. The first one is the conditional probability $\beta_{j,i} = \Pr(F_{j,i} \mid F_{j,0})$. Its smoothed form $\beta'_{j,i}$ is as follows:

♦ BPLT+ with Jelinek-Mercer

$$\beta'_{j,i} = (1 - \lambda) \beta_{j,i} + \lambda \gamma_{j,i} \qquad (12)$$

♦ BPLT+ with Dirichlet priors

$$\beta'_{j,i} = \frac{\beta_{j,i} + \mu \gamma_{j,i}}{1 + \mu} \qquad (13)$$

♦ BPLT+ with absolute discounting

$$\beta'_{j,i} = \frac{\max(c(t;p) - \delta, 0)}{\sum_{t \in T} c(t;p)} + \delta \gamma_{j,i} \qquad (14)$$

In Jelinek-Mercer BPLT+ and Absolute Discounting BPLT+, we use the existing smoothing methods. The smoothing parameters $\lambda$ and $\delta$ are within the range [0, 1]. In Dirichlet Priors BPLT+, we modify the Dirichlet smoothing technique by dividing both the numerator and the denominator in (10) by $\sum_{t \in T} c(t;p)$, which normalizes the parameter to the range [0, 1], where $\mu = \mu_0 / \sum_{t \in T} c(t;p)$.

The other part of formula (6) that needs smoothing is $\log \frac{1 - \alpha}{\alpha}$, which is a simple division that can be smoothed via Laplace smoothing, giving the denominator $\alpha + \theta$ (15), where $\theta$ is a tuning parameter that ranges from 0 to 1.
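
To see why smoothing the conditional probability matters, consider a pair of tests that never co-occur in the training data, so that the unsmoothed term log(β / (γ − β)) would be log 0. The following minimal sketch applies the Jelinek-Mercer form of formula (12) to that case; the function names and the numeric values are hypothetical, and the code assumes 0 < λ < 1 and β < γ.

```python
import math


def smooth_beta_jm(beta, gamma, lam):
    """Formula (12): interpolate the conditional probability with the marginal."""
    return (1.0 - lam) * beta + lam * gamma


def smoothed_log_term(beta, gamma, lam):
    """One summand of formula (6), computed with the smoothed beta."""
    beta_s = smooth_beta_jm(beta, gamma, lam)
    return math.log(beta_s / (gamma - beta_s))


# Hypothetical case: the two tests never co-occur (beta = 0), gamma = 0.3.
term = smoothed_log_term(0.0, 0.3, 0.2)
```

With β = 0 the smoothed value is λγ and the denominator is (1 − λ)γ, so the term reduces to log(λ / (1 − λ)), a finite penalty instead of an undefined value.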



Evaluation environments 

Datasets 

The data sets in our experiments are obtained from Alpha Global IT [1,36]. Alpha Corporate Group provides laboratory, medical clinic, commercial electronic medical record and practice management software. The data set contains 78 months of patients' laboratory test results. Our experiments use 6 months of results, containing 1,048,575 patient records, as a case study. Records of thousands of patients and more than 400 laboratory tests are included in our experiments. The data format is the same as the example shown in Figure 1. Our data set contains real patient information, such as health card ID, age, gender, date of visit, laboratory test ID and laboratory test results. We only use the patient ID and laboratory test ID attributes in this paper, and analyze the associations among the laboratory tests. In future work, we will incorporate more attributes into the laboratory test recommendation model.

Validation data and measure 

To evaluate the BPLT+ models objectively, we divide the data set into three components: a training set, a validation input set, and a validation label set. The data set is first randomly split into a training set and a validation set. In this step, we split by patient and never split the records of the same patient. Then, for the validation set, we randomly remove one test $t^*$ from each patient $p_j$ and store $t^*$ in the validation label set. The ranked list returned by BPLT+ is compared with $t^*$ for each patient. To measure this comparison and evaluate the effectiveness of BPLT+, we use CorrectRate_X, defined below [1]. Suppose the returned laboratory test ranking list for patient $p_j$ is $L = (t'_{1,j}, t'_{2,j}, \ldots)$. CorrectRate_X checks whether $t^*$ appears among the top-ranked tests. The measure is adapted from the Mean Average Precision (MAP) [32] evaluation metric.
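
The split described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in the experiments: records are assumed to be a dictionary from patient ID to the set of test IDs, the function name and the seed are hypothetical, and patients with fewer than two tests are skipped so that at least one test remains as input.

```python
import random


def split_and_hold_out(records, train_fraction, seed=0):
    """Split by patient, then hold out one random test per validation patient.

    records: {patient_id: set of test ids}
    Returns (train, validation_input, validation_label).
    """
    rng = random.Random(seed)
    patients = sorted(records)
    rng.shuffle(patients)                      # random split at the patient level
    n_train = int(len(patients) * train_fraction)

    train = {p: set(records[p]) for p in patients[:n_train]}
    validation_input, validation_label = {}, {}
    for p in patients[n_train:]:
        tests = sorted(records[p])
        if len(tests) < 2:
            continue                           # assumption: keep at least one input test
        held_out = rng.choice(tests)           # the removed test becomes the label
        validation_label[p] = held_out
        validation_input[p] = set(tests) - {held_out}
    return train, validation_input, validation_label


# Hypothetical toy data: 10 patients with three tests each.
records = {f"p{i}": {"a", "b", "c"} for i in range(10)}
train, val_input, val_label = split_and_hold_out(records, 0.5, seed=1)
```

Splitting on patient IDs rather than on individual records keeps all tests of one patient on the same side of the split, as required above.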

Definition 3 CorrectRate_X evaluates the accuracy of a laboratory test prediction system. It is the number of patients whose desired (gold standard) test matches one of the top $X$ tests generated by the system, divided by the total number of patients:

$$\mathrm{CorrectRate}_X = \frac{\sum_{j=1}^{n} TOP_{j,X}}{n} \qquad (16)$$

where

$$TOP_{j,X} = \begin{cases} 1 & \text{if } t^* \text{ matches a test in } \{t'_{1,j}, \ldots, t'_{X,j}\} \\ 0 & \text{otherwise} \end{cases}$$

$n$ is the number of patients, and $X$ is a parameter indicating how many top tests are compared with the gold standard test $t^*$; it is set to 1 or 3 in this paper.

Table 1 presents an example of how CorrectRate_X evaluates the model. Suppose the laboratory test set includes 200 tests and there are 5 patients in the validation set. As introduced above, the BPLT+ model returns a ranked list for each patient. Here ">" indicates that the weight of the laboratory test on the left is higher than that of the test on the right. In our example, 2 out of 5 patients have the desired test $t^*$ ranked in the top 1 position of the list, so CorrectRate_1 equals 0.4. And 4 out of 5 patients have $t^*$ within the top 3 positions of the returned ranking list, so CorrectRate_3 equals 0.8. Since the top 3 positions include the top 1 position, the following statement is always true: CorrectRate_1 ≤ CorrectRate_3.
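
The metric is straightforward to compute. The sketch below uses hypothetical test IDs arranged to reproduce the hit pattern of the five-patient example (hits at rank 1, rank 1, rank 2, no hit, rank 3); the function name and data layout are assumptions made for illustration.

```python
def correct_rate(ranked_lists, gold_tests, x):
    """Fraction of patients whose held-out test is among the top x recommendations."""
    hits = 0
    for patient, gold in gold_tests.items():
        if gold in ranked_lists[patient][:x]:   # TOP_{j,X} = 1
            hits += 1
    return hits / len(gold_tests)


# Hypothetical IDs; only the rank of the gold test in each list matters.
gold_tests = {
    "p1": "test104", "p2": "test50", "p3": "test2",
    "p4": "test95", "p5": "test198",
}
ranked_lists = {
    "p1": ["test104", "test5", "test40"],
    "p2": ["test50", "test5", "test15"],
    "p3": ["test95", "test2", "test34"],
    "p4": ["test75", "test19", "test55"],
    "p5": ["test92", "test134", "test198"],
}

rate_1 = correct_rate(ranked_lists, gold_tests, 1)   # 2 of 5 patients
rate_3 = correct_rate(ranked_lists, gold_tests, 3)   # 4 of 5 patients
```

Because the top 3 slice always contains the top 1 slice, `rate_1` can never exceed `rate_3`.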

BPLT+ System Framework

The framework of the BPLT+ model is shown in Figure 2. The data set in this framework is abstracted to contain only the patient ID and the laboratory test ID. The procedures in the proposed framework are described as follows.

♦ Split: First, the data set is randomly split into a training set and a validation set.

♦ Randomly remove a test as label: Since it is hard to objectively evaluate the performance of the BPLT+ model, we further randomly remove a test for each visit of the patients from the validation set. These removed tests are regarded as the labels of the validation input set. Our ultimate goal is to recommend the missing test for a patient's visit.

♦ Build training matrix: To avoid repeatedly calculating the frequency of a test and the joint frequency between two tests, we build a training matrix from the training data. This training matrix contains the frequency of co-occurrences of two laboratory tests. For example, if a patient in the training data did $test_1$ and $test_2$ together, then 1 is added to $F_{12}$ and $F_{21}$. The training matrix is therefore symmetric.

♦ BPLT+ model: The correlation of a given $test_0$ and a patient is calculated based on formula (6).

♦ Evaluation via CorrectRate_X: Finally, the evaluation criterion CorrectRate_X evaluates whether the model made the correct recommendations.
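
The training matrix step can be sketched as a single pass over the training patients. This is a minimal illustration under assumptions: the training data is taken to be a dictionary from patient ID to the set of test IDs, the function name is hypothetical, and a sparse dictionary keyed by test pairs stands in for the matrix.

```python
from collections import defaultdict
from itertools import combinations


def build_training_matrix(train):
    """Count single-test and pairwise co-occurrence frequencies.

    train: {patient_id: set of test ids}
    Returns (test_count, pair_count), where pair_count[(a, b)] == pair_count[(b, a)].
    """
    test_count = defaultdict(int)
    pair_count = defaultdict(int)
    for tests in train.values():
        for t in tests:
            test_count[t] += 1
        for a, b in combinations(sorted(tests), 2):
            pair_count[(a, b)] += 1    # F_ab
            pair_count[(b, a)] += 1    # F_ba, which keeps the matrix symmetric
    return test_count, pair_count


# Hypothetical toy training set.
train = {"p1": {"t1", "t2"}, "p2": {"t1", "t2", "t3"}, "p3": {"t3"}}
test_count, pair_count = build_training_matrix(train)
```

Once these counts are stored, the quantities α, γ and β in formula (6) can be obtained by lookups and divisions, without scanning the training records again for each validation patient.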

Results 

We first show the overall performance under different training-validation proportions in Table 2 [1]. We randomly take 40%, 50% and 60% of the raw data set as the training data and keep the rest as the validation data. In general, the BPLT+ model performs better with a larger training data set, because a larger training set contains more information and more knowledge can be learned. With the development of computer technology, larger amounts of medical data will become available in practice. Therefore, we use 60% of the data as training data in the rest of this paper. As discussed before, CorrectRate_3 is always at least as high as CorrectRate_1. In general, the BPLT+ model has promising performance, with an accuracy of 0.7074 for CorrectRate_1 and 0.7840 for CorrectRate_3.

Then we investigate in detail how the smoothing parameters affect the effectiveness. We first consider smoothing $\beta_{j,i}$ only. Three smoothing techniques are used to smooth $\beta_{j,i}$: Jelinek-Mercer BPLT+, Dirichlet Priors BPLT+ and Absolute Discounting BPLT+, with the corresponding parameters $\lambda, \mu, \delta \in [0, 1]$. We conduct experiments on these three methods individually. The changes of CorrectRate_1 and CorrectRate_3 with respect to the parameters are shown in Figure 3, Figure 4, and Figure 5. We can see from the figures that the curve of CorrectRate_1 is always below the curve of CorrectRate_3, which is consistent with our discussion of Definition 3. As the parameters increase from 0.1 to 1, both CorrectRate_1 and CorrectRate_3 become higher at the beginning because of the incorporated smoothing portion. After reaching the maximum value, CorrectRate_1 and CorrectRate_3 become lower, since the weighting tends to be more universal when too much smoothing is incorporated. All the smoothing parameters achieve their best performance at the value of 0.2. Among these three methods, Jelinek-Mercer BPLT+ obtains the best performance on both CorrectRate_1 and CorrectRate_3, which are 0.5569 and 0.6167. In terms of the average value, Dirichlet Priors BPLT+'s average performance on CorrectRate_3 is better than that of the other two, and Jelinek-Mercer BPLT+'s average performance on CorrectRate_1 is the best.

Table 1 An example of CorrectRate_X

Patient        Desired test t*   Recommendation list                    X = 1                 X = 3
p_1            test_104          test_104 > test_5 > test_40 > ...      TOP_{1,1} = 1         TOP_{1,3} = 1
p_2            test_50           test_50 > test_5 > test_15 > ...       TOP_{2,1} = 1         TOP_{2,3} = 1
p_3            test_2            test_95 > test_2 > test_34 > ...       TOP_{3,1} = 0         TOP_{3,3} = 1
p_4            test_95           test_75 > test_19 > test_55 > ...      TOP_{4,1} = 0         TOP_{4,3} = 0
p_5            test_198          test_92 > test_134 > test_198 > ...    TOP_{5,1} = 0         TOP_{5,3} = 1
All patients                                                            CorrectRate_1 = 0.4   CorrectRate_3 = 0.8

This example contains a validation set of 5 patients, their desired laboratory test t*, the recommendation list, and the corresponding evaluation results.

Figure 2 BPLT+ System Framework. The procedures for processing the laboratory data and testing the BPLT+ model are shown: (1) the rectangles represent the data sets; (2) the rounded rectangles represent the implemented procedures; (3) the ovals show the personalized laboratory model; (4) the lines with arrows indicate the directions through the framework.

We further discuss smoothing the second part of (6), where the Laplace smoothing parameter is $\theta$. As discussed before, Jelinek-Mercer BPLT+ has the best performance on both CorrectRate_1 and CorrectRate_3. We focus on investigating the sensitivity of $\theta$ by fixing Jelinek-Mercer BPLT+ with $\lambda = 0.2$. The results are shown in Figure 6. We can see that CorrectRate_1 increases as $\theta$ increases, while CorrectRate_3 decreases a little and then increases. Both of them reach the maximum and tend to be stable when $\theta$ is greater than 0.5.

Table 2 Performance

Percentage of Training Data   CorrectRate_1   CorrectRate_3
60%                           0.7074          0.7840
50%                           0.6962          0.7837
40%                           0.6823          0.7821

The overall performance of BPLT+ with different training-validation proportions.

Figure 3 Parameter sensitivity of $\lambda$ in Jelinek-Mercer BPLT+. The influence of parameter $\lambda$ is investigated: (1) the stars represent the performance of Jelinek-Mercer BPLT+ under the evaluation metric CorrectRate_1; (2) the circles represent the performance of Jelinek-Mercer BPLT+ under the evaluation metric CorrectRate_3; (3) CorrectRate_3 is always higher than CorrectRate_1; (4) Jelinek-Mercer BPLT+ achieves its best performance when $\lambda = 0.2$.

Figure 4 Parameter sensitivity of $\mu$ in Dirichlet Priors BPLT+. The influence of parameter $\mu$ is studied: (1) the stars represent the performance of Dirichlet Priors BPLT+ under the evaluation metric CorrectRate_1; (2) the circles represent the performance of Dirichlet Priors BPLT+ under the evaluation metric CorrectRate_3; (3) Dirichlet Priors BPLT+ achieves its best performance when $\mu = 0.2$.

Figure 5 Parameter sensitivity of $\delta$ in Absolute Discounting BPLT+. The influence of parameter $\delta$ is investigated: (1) the stars represent the performance of Absolute Discounting BPLT+ under the evaluation metric CorrectRate_1; (2) the circles represent the performance of Absolute Discounting BPLT+ under the evaluation metric CorrectRate_3; (3) Absolute Discounting BPLT+ achieves its best performance when $\delta = 0.2$.

Conclusions and future work 

An Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT+) model is proposed in this paper. Based on the assumption that hidden associations could exist among laboratory tests, we employ a Bayesian approach to build a weighting function for scoring the correlation between a new laboratory test and a patient. To obtain a more robust and better-performing model, we integrate several enhanced smoothing techniques into the BPLT+ model. The main purpose of smoothing in this paper is to assign a non-zero probability to unseen laboratory tests and to improve the accuracy of test probability estimation. We integrate existing smoothing techniques into the BPLT+ model. In particular, we use three techniques, the Jelinek-Mercer, Dirichlet Priors and Absolute Discounting approaches, to smooth the conditional probability of observing a patient taking an existing test when a new test $test_0$ is given (Formulas 12-14). We also use the Laplace method to smooth the log function in the BPLT+ model (Formula 15). We conducted experiments to examine the performance of the BPLT+ model and the sensitivity of the smoothing parameters. We find that BPLT+ is able to make accurate recommendations under proper smoothing parameters.

Further, we propose a novel framework, shown in Figure 2, for effectively implementing the BPLT+ model and objectively testing personalized recommendation systems without human interaction. Based on real patients' laboratory test data, we randomly generate a training data set, a validation input set and a validation label set. A training matrix containing the laboratory test statistics is calculated from the training data set and stored. For new patients (the validation input set), instead of processing the original training set, we use this training matrix to predict the laboratory tests for the validation input set, and compare the ranking results with the validation label set.

There are a few future directions for this research. As can be seen from the data format in Figure 1, we have not made use of all the attributes. In the future, we would like to conduct a comprehensive investigation of the patients' profiles. For example, we can cluster the patients into groups and investigate the similarities of the patients in the same group. We can also analyze the associations among laboratory test results and thereby further enhance our proposed personalized recommendation model. Moreover, we look forward to testing our proposed models in more real applications.

Figure 6 Parameter sensitivity of $\theta$. The influence of parameter $\theta$ is presented: (1) we use the best smoothing technique for the first part in Formula 6, which is Jelinek-Mercer BPLT+; (2) the smoothing parameter $\lambda$ is set to be optimal; (3) the stars represent the results of Jelinek-Mercer BPLT+ under evaluation metric CorrectRate_1; (4) the circles represent the results of Jelinek-Mercer BPLT+ under evaluation metric CorrectRate_3; (5) both metrics reach the maximum and tend to be stable when $\theta$ is greater than 0.5.